Synchronized Expansion and Contraction of Olfactory, Vomeronasal, and Taste Receptor Gene Families in Hystricomorph Rodents

Abstract Chemical senses, including olfaction, pheromones, and taste, are crucial for the survival of most animals. There has long been a debate about whether different types of senses might influence each other. For instance, primates with a strong sense of vision are thought to have weakened olfactory abilities, although the oversimplified trade-off theory is now being questioned. It is uncertain whether such interactions between different chemical senses occur during evolution. To address this question, we examined four receptor gene families related to olfaction, pheromones, and taste: olfactory receptor (OR), vomeronasal receptor type 1 and type 2 (V1R and V2R), and bitter taste receptor (T2R) genes in Hystricomorpha, which is morphologically and ecologically the most diverse group of rodents. We also sequenced and assembled the genome of the grasscutter, Thryonomys swinderianus. By examining 16 available genome assemblies alongside the grasscutter genome, we identified orthologous gene groups among hystricomorph rodents for these gene families to separate the gene gain and loss events in each phylogenetic branch of the Hystricomorpha evolutionary tree. Our analysis revealed that the expansion or contraction of the four gene families occurred synchronously, indicating that when one chemical sense develops or deteriorates, the others follow suit. The results also showed that V1R/V2R genes underwent the fastest evolution, followed by OR genes, and T2R genes were the most evolutionarily stable. This variation likely reflects the difference in ligands of V1R/V2Rs, ORs, and T2Rs: species-specific pheromones, environment-based scents, and toxic substances common to many animals, respectively.


Introduction
The chemical senses play a critical role in the survival of most mammals, enabling them to locate food, find mates and offspring, identify territories, and avoid potential dangers.Mammalian species rely on at least five different multigene families of G protein-coupled receptors (GPCRs) for chemosensation, including olfactory receptors (ORs), vomeronasal receptors, and taste receptors (Table 1; Nei et al. 2008;Niimura et al. 2020).Among these families, the OR gene family is notably the largest, reflecting the vast diversity of odor molecules present in the environment.The number of OR genes encoded in the genome varies significantly across species, with ∼400 in humans, ∼1,100 in mice, and ∼2,000 in African elephants (Niimura and Nei 2003;Niimura 2012;Niimura et al. 2014Niimura et al. , 2018)).The olfactory system operates through a combinatorial coding scheme where the relationship between odorants and ORs is not one-to-one but rather multiple-to-multiple.In other words, each odorant is recognized as a combination of activated ORs.This mechanism allows each species to detect a much larger number of odors than the actual count of OR genes encoded in the species' genome.The OR gene family is known for containing numerous pseudogenes.For instance, in humans and African elephants, the number of OR pseudogenes exceeds that of functional genes, with ∼440 pseudogenes in humans and ∼2,200 in African elephants.Additionally, frequent gene gains and losses characterize this gene family (Niimura and Nei 2007).OR genes are expressed in the main olfactory epithelium (MOE) within the nasal cavity and were first identified in rats in 1991 (Buck and Axel 1991).
Most mammals possess a secondary olfactory organ known as the vomeronasal organ (VNO), situated between the nasal and oral cavities.Initially considered specialized for pheromone detection, the VNO is now believed to share some functions with the MOE (Liberles 2014).The VNO and MOE are distinctly separate at the molecular level.Sensory neurons in the apical and basal regions of the VNO express vomeronasal receptors type 1 and type 2 (V1Rs and V2Rs), responsible for detecting pheromones from volatile molecules like sulfated steroids and watersoluble peptides, respectively (Niimura et al. 2020).The number of both V1R and V2R genes varies significantly among mammalian species (Young and Trask 2007;Young et al. 2010).The absence of the VNO in adult hominoids (humans and apes) and Old World monkeys correlates with very few or no functional V1R/V2R genes in their genomes (Table 1).It has been suggested that the number of intact V1R genes relates to the morphological complexity of the VNO (Takami 2002;Grus et al. 2005).Functional V2R genes are sparsely distributed in the phylogeny of mammals.Currently, intact V2R genes have been observed exclusively in rodents, strepsirrhines (a primate suborder including lemurs and lorises), opossums, and the platypus (Young and Trask 2007;Dong et al. 2012;Hohenbrink et al. 2013).Notably, while dogs and cows possess functional VNO and V1R genes, they lack functional V2R genes.
Taste receptors are expressed in the taste buds of the tongue.Among the five basic tastes, sweet, umami, and bitter tastants are detected by two multigene families of GPCRs, known as taste receptors type 1 and 2 (T1Rs and T2Rs; Chandrashekar et al. 2006).Generally, mammalian  The table includes the numbers of intact genes, along with the total of truncated genes and pseudogenes in parentheses.The numbers of OR genes are derived from Niimura et al. (2014Niimura et al. ( , 2018)), V2R and T1R genes are from Nei et al. (2008), V1R genes are from Nei et al. (2008) and Young et al. (2010), and T2R genes are from Hayakawa et al. (2014).Niimura et al. • https://doi.org/10.1093/molbev/msae071MBE species possess three T1R genes: T1R1, T1R2, and T1R3 (Table 1).T1Rs function as heterodimers; T1R2 + T1R3 and T1R1 + T1R3 form receptors for sweet and umami tastants, respectively.In contrast, the T2R genes form a relatively large multigene family with up to ∼40 member genes in mammals, indicating the importance of detecting bitter substances, often harmful to animals.Interestingly, T1R genes exhibit sequence similarities to V2R genes, while T2R genes share sequence similarities with V1R genes.Therefore, vomeronasal and taste receptor genes share a common evolutionary origin.V1R and T2R genes lack introns, resembling OR genes, whereas V2R and T1R genes are encoded by 6 to 7 exons and possess long N-terminal tails.
There has been a long-standing debate regarding how different senses interact with each other.It has been proposed that increased acuity in one modality of senses can lead to the decline of another.For instance, studies by Keesey et al. (2019) and Stöckl et al. (2016) illustrated a trade-off between vision and olfaction in Drosophila and Lepidopteran insects, respectively.This trade-off is likely due to the substantial energy costs needed to maintain neural systems for sensation within a limited energy budget (Niven and Laughlin 2008).In the case of mammals, a similar trade-off between vision and olfaction has been suggested in primates based on comparisons of brain structure (Barton et al. 1995;Barton and Harvey 2000).Gilad et al. (2004) argued that the loss of OR genes coincided with the acquisition of full trichromatic vision.They examined the fraction of pseudogenes among 100 randomly selected OR gene sequences from each of 19 primate species.However, subsequent studies by Matsui et al. (2010) and Niimura et al. (2018) revealed that the acquisition of full trichromatic vision and the loss of OR genes are not directly linked when examining the entire repertoires of OR genes identified from whole genome sequences.Instead, Niimura et al. (2018) demonstrated that the rate of OR gene losses accelerated during primate evolution in two specific instances: (i) at the ancestral branch of haplorhines, where significant changes occurred in eye and nose morphology, and (ii) at the ancestral branch of leaf-eating colobines, where the diet shifted from mainly consuming fruit (frugivory) to predominantly eating leaves (folivory).
Cetaceans, which include toothed whales (Odontoceti) and baleen whales (Mysticeti), present another instance of trade-off between different sensory modalities.These marine mammals have undergone a significant reduction in olfactory and taste receptor genes, possibly due to their transition from a terrestrial to an aquatic environment (Feng et al. 2014;Zhu et al. 2014;Kishida et al. 2015;Kishida 2021).Odontocetes have completely lost their olfactory nervous systems, opting instead for the development of an echolocation system.This system involves emitting clicking sounds and measuring the time lapse between the emitted sounds and their echoes to create three-dimensional images of surrounding objects.In contrast, mysticetes have a considerably reduced but fully functional olfactory system (Kishida 2021).As the olfactory systems degenerate, odontocetes have only 10 to 20 intact OR genes, while mysticetes possess a larger repertoire of intact OR genes (50 to 100) compared with odontocetes (Liu et al. 2019;Kishida 2021).The loss of chemosensation in cetacean evolution is reported to have occurred gradually through multiple steps (Kishida et al. 2015).Kishida and Thewissen (2012) proposed that the diminished chemosensation in odontocetes was not directly linked to the adoption of echolocation.However, a more recent study by Springer and Gatesy (2017) suggests a trade-off between the two sensory systems.It is worth noting that pinnipeds and sirenians, which have independently adapted to aquatic life apart from cetaceans, also exhibit reduced OR genes compared with their terrestrial relatives (Beichman et al. 2019;Liu et al. 2019).
Bats have also developed a sophisticated echolocation system independent of odontocetes.Studies by Hayden et al. (2010) and Hayden et al. (2014) found no evidence of a trade-off between the development of echolocation and the loss of olfaction when examining the OR gene repertoires of various bat species.However, there is a proposal that a sensory trade-off between vision and echolocation has occurred in bats at the genetic level (Zhao et al. 2009;Shen et al. 2013;Gutierrez et al. 2018;Wu et al. 2018).
While the association between different sensory modalities in evolutionary processes has garnered scientific attention as mentioned above, there has been relatively less exploration of evolutionary patterns among different modalities of chemical senses, olfaction, pheromone detection, and taste.Did a trade-off between the sense of smell and taste occur during evolution?Alternatively, if there is a development or decline in the sense of smell within a specific lineage, does this correspondingly influence the development or regression of the sense of taste?The objective of this study is to investigate the interplay among different chemical senses by comparing the evolutionary dynamics among families of chemosensory receptor genes responsible for detecting odors, pheromones, and tastants.
Table 1 highlights that in rodents, the quantities of V1R and V2R genes are notably higher compared with those in other orders, while the counts of OR and T2R genes are slightly larger or similar to those in other mammalian orders.The order Rodentia is categorized into either five suborders, Sciuromorpha, Castorimorpha, Myomorpha, Anomaluromorpha, and Hystricomorpha (Carleton and Musser 2005), or three suborders, Sciuromorpha, Supramyomorpha (which includes Castorimorpha, Myomorpha, and Anomaluromorpha), and Hystricomorpha (D'Elía et al. 2019).In both systems of classification, Hystricomorpha stands out as the most morphologically and ecologically diverse among these suborders (as described below).The array of chemosensory receptor genes present in each species' genome is influenced by its habitat.Consequently, Hystricomorpha is the most suitable group of species to explore the evolutionary dynamics of these gene families.For this reason, this study has specifically focused on Hystricomorpha.
Evolution of Chemosensory Receptor Genes in Hystricomorph Rodents • https://doi.org/10.1093/molbev/msae071MBE Hystricomorph rodents are characterized as hystricomorphous, indicating an enlarged infraorbital foramen that allows the passage of the medial masseter muscle (Honeycutt 2009).Notable species within this category include porcupines, naked mole-rats, capybaras, and even domesticated animals like guinea pigs.These suborder members display significant diversity in their habitat, behavior, and physical characteristics.For instance, naked mole-rats, nearly devoid of hair, live in elaborate underground eusocial communities.Nutrias predominantly inhabit freshwater embankments, constructing nests with intricate tunnel systems with passages and chambers.Porcupines, for protection, erect sharp quills composed of modified hair.Furthermore, Hystricomorpha species vary considerably in size and weight, spanning from the tiny naked mole-rat (30 to 80 g) to the larger capybara (35 to 66 kg).Animals of the Hystricomorpha species predominantly consume a wide array of plant-based foods, including fruits, seeds, nuts, roots, bulbs, bark, shrubs, grass, and grains (Wilman et al. 2014).For instance, common gundis inhabiting rocky areas in North Africa are herbivores, exclusively consuming a variety of plants.Central American agoutis primarily feed on fruits and drupes, while grasscutters, also referred to as greater cane rats, mainly consume grass.These ecological and dietary distinctions might be partly explained by differences in their chemical senses.
In this study, we aimed to explore the relationship between various chemical sensory modalities.We identified the almost complete repertoires of 5 GPCR gene families associated with chemical senses-OR, V1R, V2R, T1R, and T2R genes-from genome assemblies of 17 Hystricomorpha species and examined their evolutionary dynamics.Among these families, OR, V1R, and T2R genes share common structural features, having seven alphahelical transmembrane (TM) regions without additional domains, and their coding sequences (CDSs) are intronless and ∼1 kb long.Due to the relatively short CDSs, nearly complete gene repertoires could be identified in each genome.On the other hand, V2R and T1R genes are split into multiple exons, causing their CDSs to often appear truncated in available genome assemblies, making accurate retrieval of the full-length CDSs challenging.Consequently, we focused on a specific exon, "exon 3," for these genes, rather than the entire CDSs, to accurately estimate the number of encoded genes in a genome (Francia et al. 2015;Niimura et al. 2021).Exon 3 of a V2R/T1R gene encodes the majority of a ligand-binding domain and spans ∼810 bp, a length comparable with OR/V1R/T2R genes.Furthermore, we conducted new sequencing of the whole genome of the grasscutter to enhance the reliability of our findings in identifying these genes.By categorizing these genes into orthologous gene groups (OGGs), we accurately estimated the occurrences of gene gains and losses for each gene family in the evolution of Hystricomorpha.Through these analyses, we discovered that the OR, V1R, V2R, and T2R gene families had expanded and contracted in synchrony during their evolutionary process.

Hystricomorph Genome Analysis
We analyzed the genome sequences of 17 species from 14 families within the suborder Hystricomorpha and used the mouse (Mus musculus) as the outgroup (supplementary table S1, Supplementary Material online).To assess the quality of the genome assemblies, we initially conducted Benchmarking Universal Single-Copy Orthologs (BUSCO) (Manni et al. 2021) and Core Eukaryotic Genes Mapping Approach (CEGMA) (Parra et al. 2007) analyses for all genomes.As expected, the mouse genome assembly exhibited the highest quality, with 96.4% complete orthologous genes from the Glires database of OrthoDBv10 (BUSCO) and 96.6% complete core vertebrate genes (CVGs) (CEGMA; supplementary fig.S1 and supplementary tables S2 and S3, Supplementary Material online).In contrast, the currently available grasscutter genome in the GenBank database (ThrSwi_v1_BIUU) displayed the lowest quality, with <60% of complete orthologous genes (BUSCO) and <50% of CVGs (CEGMA).
In order to enhance the quality of our genome data, we conducted a complete genome sequencing of the grasscutter, resulting in a new genome assembly labeled as ThrSwi_NIG_v1.The quality of this newly assembled grasscutter genome significantly surpassed that of ThrSwi_v1_BIUU.According to the BUSCO analysis, we identified 94.1% of complete orthologs and 93.3% of complete single-copy orthologs from the Glires data set, with the latter figure surpassing that of the mouse (supplementary fig.S1, Supplementary Material online).In the CEGMA analysis, 95.3% of the CVGs were identified in this assembly (supplementary table S3, Supplementary Material online).The size of the ThrSwi_NIG_v1 assembly stood at 2.18 Gb, closely resembling the estimated grasscutter genome size of 2.2 Gb.In contrast, the prior assembly, ThrSwi_v1_BIUU, measured significantly larger at 2.66 Gb, exceeding the estimated size.Consistent with this observation, the number of orthologs per core gene in the CEGMA analyses notably reduced from 1.62 (ThrSwi_v1_BIUU) to 1.01 (ThrSwi_NIG_v1).
We proceeded by constructing a phylogenetic tree featuring the 17 Hystricomorpha species alongside the mouse as an outgroup.This was achieved using a concatenated sequence of 2,520 one-to-one orthologous genes, acquired through the BUSCO analysis (supplementary table S4, Supplementary Material online).Accurate phylogenetic relationships are crucial to estimate the occurrences of gene gains and losses in the evolution of Hystricomorpha (as detailed below).The tree topology generated via the maximum likelihood (ML) method closely resembled that of a prior study (Lacher et al. 2016), exhibiting 100% bootstrap support for all nodes (supplementary fig.S2, Supplementary Material online).Through divergence analysis, we were able to pinpoint the divergence among these rodents on an evolutionary time scale (Fig. 1).

Chemosensory Receptor Genes in Hystricomorpha
To investigate the evolutionary patterns of chemosensory receptor gene families within Hystricomorpha, we identified OR, V1R, V2R, T1R, and T2R genes from the genome sequences of 17 Hystricomorpha species (Fig. 2; supplementary data S1 to S10, Supplementary Material online).For OR, V1R, and T2R genes, encoded by a single exon, we categorized the identified genes into three groups: intact genes, truncated genes, and pseudogenes.An intact gene is presumed to be functional, whereas a pseudogene contains disruptive elements such as stop codons, frameshifts, or gaps in conserved regions.A truncated gene represents a partially intact sequence found at the end of a contig, potentially becoming intact if the quality of the genome sequence is improved (Niimura and Nei 2007).Consequently, the number of intact genes depicted in Fig. 2 represents the lower bound of the number of functional genes encoded in each genome.
A V2R/T1R gene is typically comprised of 6 to 7 exons.Mouse V2R genes, with an average length of ∼25.3 kb (including introns), are more than 25 times longer than OR/V1R/T2R genes.Hence, the likelihood of a V2R/T1R gene being truncated in a genome assembly consisting of short contigs is substantially higher compared with an OR/V1R/T2R gene.Notably, the estimated probability of truncation for V2R genes is over 0.45 in 8 out of 17 genomes analyzed in this study (see Materials and Methods; supplementary table S1, Supplementary Material online).This suggests that for a significant portion of V2R genes from examined genome assemblies in this study, it is improbable to retrieve the entire CDSs.
Hence, instead of the entire CDSs, we focused on identifying the "exon 3" sequences of V2R/T1R genes (see Materials and Methods and Discussion for details).Consequently, we successfully pinpointed three intact exon 3 sequences of T1R genes in each species, excluding the common gundi, which lacks a T1R1 gene (Fig. 2; supplementary figs.S3 and S4, Supplementary Material online).We did not encounter any T1R pseudogene sequences.These findings demonstrate the stability in the number of T1R genes throughout hystricomorph evolution, rendering the T1R gene family less useful for exploring interactions among different gene families.Therefore, our subsequent analyses will focus on the remaining four gene families: OR, V1R, V2R, and T2R genes.
The number of intact genes varied across species for each of the four gene families.Across the 17 Hystricomorpha species, the mean numbers for intact OR, V1R, V2R, and T2R genes were 800, 73, 45, and 30, respectively (Table 2).The coefficients of variation, calculated by dividing the standard deviation by the mean, for intact OR, V1R, V2R, and T2R genes were 0.42, 0.45, 0.61, and 0.27, respectively.These figures indicate that the number of T2R genes displays less variability compared with the other gene families.
The numbers of intact genes for V1R and T2R exhibited a strong positive correlation (Spearman correlation coefficient r S = 0.91; supplementary fig.S5, Supplementary Material online, top).Similarly, those for V1R and V2R genes (r S = 0.64) and for V2R and T2R gene (r S = 0.59) also demonstrated positive correlations.However, evolutionarily closely related species are expected to have similar gene counts due to shared ancestry.To address this, we Fig. 1.Phylogenetic tree of 17 Hystricomorpha species along with the mouse.The calibrated phylogenetic tree of Hystricomorpha with M. musculus as an outgroup was generated through Bayesian analysis in BEAST using a data set comprising 2,520 genes.Geological periods are depicted at the top.Each node displays the estimated divergence time (in MYA), with a bar representing the 95% credibility intervals for the respective node.
The majority of Hystricomorpha species possesses between 600 and 900 intact OR genes.Interestingly, Central American agoutis stand out with an exceptionally  Evolution of Chemosensory Receptor Genes in Hystricomorph Rodents • https://doi.org/10.1093/molbev/msae071MBE high count of intact OR genes (1,914), rivaling the numbers found in African elephants, which have the largest OR gene repertoire ever studied (Niimura et al. 2014).Conversely, the number of intact OR genes in common gundis was notably small (359).Central American agoutis possess the largest collection of intact V2R genes (93) and T2R genes (54) as well.As far as our knowledge goes, this count of T2R genes is the highest ever documented among all mammalian species (Hayakawa et al. 2014;Li and Zhang 2014;Shang et al. 2017) (see Discussion).In contrast, naked mole-rats and Damaraland mole-rats exhibit the smallest T2R gene repertoires (20).Mole-rats possess unique ecological traits, dwelling underground and living in eusocial communities, seldom surfacing and primarily feeding on bulbs and tubers (Jarvis 1981;Jarvis and Bennett 1993;Oosthuizen et al. 2003;Buffenstein et al. 2022).Among the species surveyed, naked mole-rats also exhibit the fewest intact V1R genes (20).
Upon comparing the previous and newly sequenced grasscutter genome assemblies, it was evident that the number of truncated OR and V1R genes notably decreased, while the total number of genes did not significantly change (see supplementary fig.S6, Supplementary Material online).This outcome once more indicates a substantial improvement in the quality of the grasscutter genome assembly.

Gains and Losses of Chemosensory Receptor Genes in the Hystricomorpha Evolution
In order to explore the evolutionary changes within the Hystricomorpha involving OR, V1R, V2R, and T2R genes, we identified OGGs for each gene family.An OGG represents a set of genes that originated from a single gene in the most recent common ancestor (MRCA) of the species under study.Consequently, we classified 13,597 OR genes, 1,237 V1R genes, 761 V2R genes, and 517 T2R genes identified from 17 Hystricomorpha species into 654, 33, 8, and 21 OGGs, respectively (Table 2; supplementary data S11 to S14, Supplementary Material online).
The quantity of OR genes within each OGG exhibited significant variability among the OGGs (Fig. 3A), a pattern previously observed in placental mammals (Niimura et al. 2014).The ancestral OR gene of OGG2-27 in the MRCA of Hystricomorpha gave rise to 363 descendant genes across 17 Hystricomorpha species (Fig. 3B and C; supplementary data S11, Supplementary Material online).OGG2-27 is the most largely expanded OGG in the Central American agouti, with 101 OR genes from this species.Intriguingly, only five mouse OR genes (Olfr1096 to 1100) and three human OR genes (OR8H1, OR8H2, and OR8H3) belong to OGG2-27 (Niimura et al. 2014), indicating a specific expansion of OR genes within this OGG occurred in the hystricomorph lineage.
Figure 4 (V2R) displays the phylogenetic trees comprising all the exon 3 sequences of 761 V2R genes found in Hystricomorpha, along with representative mouse V2R genes (supplementary figs.S11 and S12, Supplementary Material online).Notably, there is a hystricomorph-specific OGG (OGG-V2R-AH1) that demonstrates extensive expansion, housing 608 V2R genes from the 17 Hystricomorpha species (supplementary data S13, Supplementary Material online).In stark contrast, OGG-V2R-C displays considerable evolutionary conservation, manifesting one-to-one orthologous relationships across all 17 Hystricomorpha species.This particular OGG is orthologous to the subfamily C of mouse V2R genes, which represents the most basal group and is coexpressed with V2R genes belonging to subfamilies A, B, and D (Silvotti et al. 2011;Brykczynska et al. 2013;Francia et al., 2015;Tirindelli 2021).
Figure 4 (T2R) shows the phylogenetic tree presenting all 517 T2R genes alongside human and mouse T2R genes (supplementary figs.S13 and S14, Supplementary Material online).Among the identified OGGs for T2R, the largest is OGG-TAS2R1, containing 112 genes, and it is the only OGG with over 100 genes (supplementary data S14, Supplementary Material online).Among the T2R genes of Central American agoutis, 24 belong to this OGG.This particular OGG is orthologous to human TAS2R1 gene.Previous research also noted a specific expansion of TAS2R1 orthologs in the hystricomorph lineage (Hayakawa et al. 2014).
The variation in the number of genes across each OGG was more pronounced for V1R and V2R compared with OR and T2R (Table 2, supplementary fig.S15, Supplementary Material online).The standard deviation of the number of V2R genes in each OGG (208) was notably larger than that of V1R (48.1;P < 10 −8 , F-test), OR (31.2;P < 10 −15 ), and T2R genes (26.1; P < 10 −11 ).Furthermore, the standard deviation of the number of V1R genes was significantly larger than that of OR (P < 10 −4 , F-test) and T2R genes (P = 0.006).These findings indicate that V1R and V2R genes experienced more dynamic evolutionary changes compared with OR and T2R genes.On the contrary, the number of T2R genes within each OGG exhibited considerably less variation, suggesting the relative stability of T2R genes during the course of Hystricomorpha evolution.Specifically, 12 out of the 21 OGGs included 14 to 17 T2R genes from the 17 Hystricomorpha species.
Next, we estimated the number of gains and losses for OR, V1R, V2R, and T2R genes for each OGG in accordance Niimura et al. • https://doi.org/10.1093/molbev/msae071MBE Fig. 4. Phylogenetic trees for V1R, V2R, and T2R genes.The V1R genes were classified into two groups: V1R-1 containing clades A, B, C, H, I, J, and K and V1R-2 containing clades D, E, F, and G. V1R-1 and V1R-2 trees encompassed 613 and 624 V1R genes from 17 Hystricomorpha species, respectively, as well as 36 and 38 mouse V1R genes, respectively.The V2R tree was constructed using the exon 3 sequences of 761 hystricomorph V2R genes alongside 28 representative mouse genes.The T2R tree employed 517 hystricomorph T2R genes with 24 human and 36 mouse genes.The OGGs, which contained >100 genes from 17 Hystricomorpha species, are depicted by their OGG names along with the number of genes they include (in parentheses).For example, "1-1" in the V1R-1 tree denotes OGG-V1R-1-1, the largest OGG for V1R, housing 191 hystricomorph genes.In both V1R-1 and V1R-2 trees, the clade names for mouse genes align with those provided by Rodriguez et al. (2000) and Grus et al. (2005).In the V2R tree, subfamily names for mouse V2R genes are labeled according to Silvotti et al. (2011).In the T2R tree, gene names for human T2R genes are displayed (e.g."14" represents TAS2R14).Color codes are outlined at the bottom, and the scale bar indicates the number of amino acid substitutions per site.
Evolution of Chemosensory Receptor Genes in Hystricomorph Rodents • https://doi.org/10.1093/molbev/msae071MBE with the Hystricomorpha phylogeny (Fig. 1), utilizing the reconciled tree method.The total count of gene gains and losses across all OGGs is illustrated in Fig. 5.In Fig. 5A, the chart shows a notably high frequency of gains and losses among OR genes during the course of evolution, as previously observed (Niimura and Nei 2007;Niimura et al. 2014).The findings also imply that although the number of OR genes varies considerably in extant Hystricomorpha species, those in the ancestral nodes appear to be relatively consistent (falling within the range of 700 to 900).Utilizing these figures, we computed the rates of gene gains/losses per gene during the evolution

MBE
of Hystricomorpha for each gene family.The analysis revealed that both the rates of gene gains and losses were highest for V1R/V2R genes, followed by OR genes, with T2R genes exhibiting the lowest rates (Table 2).
We analyzed the correlation between the number of gene gains or losses along each branch of Hystricomorpha evolution among OR, V1R, V2R, and T2R genes (Fig. 6).The results demonstrated a significant positive correlation in the number of gains across these four gene families in most instances (P < 5%).Interestingly, there was a more stringent positive correlation in the number of losses (P < 0.1%).Particularly strong correlations were observed between V1R and V2R genes for both gene gains and losses, indicating notable associations (P < 0.1%).

Discussion
In this study, we investigated the evolutionary dynamics among 4 chemosensory receptor genes-OR, V1R, V2R, and T2R-responsible for detecting odorants, pheromones, and tastants in 17 Hystricomorpha species.The findings reveal that (i) V1R/V2R genes exhibit the fastest gene turnover, followed by OR genes, while T2R genes display the slowest turnover.Additionally, (ii) gains or losses occurred synchronously among OR, V1R, V2R, and T2R genes during Hystricomorpha evolution.This synchronous fluctuation suggests that these four gene families expanded or contracted simultaneously throughout the evolutionary process, indicating no trade-off among different chemical sensing modalities.
Observation (i) can be elucidated by noting the functional distinctions among the four gene families.Ligands of V1Rs and V2Rs are species-specific pheromones.ORs are receptive to odorants that depend on distinct living environments in each species.On the other hand, T2Rs are attuned to detecting bitter taste compounds, a common signal of potential toxicity in most animals.Hence, it is plausible that V1R/V2R genes undergo the most rapid evolution, OR genes display intermediate evolutionary rates, and T2R genes are the most conserved across species.Similar contrasting evolutionary patterns of OR and V1R/ V2R genes have been observed in a broader spectrum of vertebrate species (Grus and Zhang 2008).While previous studies have reported considerable variability in the number of T2R genes among mammals (Shi and Zhang 2006;Hayakawa et al. 2014;Li and Zhang 2014), our study revealed that the degree of variability is greater for OR, V1R, and V2R genes than for T2R genes, both across species and among OGGs (Table 2).
It should be noted that the current number of intact genes within a species is a result of both gene gains and losses over time.If an equal number of gains and losses occur, they counteract each other, resulting in no net change in the gene count.Since the frequency of gene gains and losses is extremely high for OR genes (Niimura and Nei 2007;Niimura et al. 2014), focusing on the numbers of gene gains and losses during evolution is more relevant than solely comparing the current gene numbers among existing species.Therefore, to explore the correlation between different gene families, we identified OGGs for each gene family and calculated the count of gene gains and losses separately for each branch in the Hystricomorpha phylogeny.
We should also note that the number of intact genes in each species cannot be considered as independent variables when examining the correlation between different gene families.This is due to the shared evolutionary history among any two species.Thus, removing the phylogenetic Evolution of Chemosensory Receptor Genes in Hystricomorph Rodents • https://doi.org/10.1093/molbev/msae071MBE dependence is crucial to assess the significance of correlations, as demonstrated in supplementary fig.S5, Supplementary Material online (bottom).On the contrary, in the analyses presented in Fig. 6, such removal is unnecessary as the number of gene gains/losses occurring at each branch was assessed separately.Consequently, the number of gene gains/losses is considered an independent variable.In this study, the significance of the correlation among different gene families was detectable because we separately calculated the number of gene gains/losses in each branch using OGGs.
In this study, we focused on examining exon 3 sequences of V2R/T1R genes rather than analyzing the entire CDSs.The rationale for utilizing exon 3 sequences as a proxy of the entire CDS is in the following: (i) exon 3 predominantly encompasses the ligand-binding domain of a V2R.(ii) The phylogenetic analysis of exon 3 sequences exhibited a remarkably similar topology to that of full-length V2R genes (Francia et al. 2015;Niimura et al. 2021).(iii) Typically, exon 3 of a V2R/T1R gene encodes an average of ∼270 amino acids, a length comparable with that of an entire OR/V1R/T2R gene.
As a result, we found three intact exon 3 sequences of T1R genes and no pseudogenes in each species, with an exception of the common gundi which lacked the T1R1 gene responsible for the umami receptor (Fig. 2; supplementary figs.S3 and S4, Supplementary Material online).The fact that precisely three intact exon 3 sequences of T1R genes were identified in most of the examined species supports the notion that the exon 3 sequence can serve as a proxy for the fulllength sequence.However, it is important to note that a single exon cannot fully represent the entirety of a gene with six or seven exons.Therefore, there is a limitation to this analysis.
We investigated the common gundi genome to detect the presence of the exons other than exon 3, but apart from an incomplete exon 6 sequence, we did not identify any of them.To examine whether the absence of the T1R1 gene in the common gundi genome is due to an incomplete genome assembly, we investigated the presence of adjacent genes (Klhl21, Zbtb48, and Nol9) to the T1R1 gene (supplementary fig.S16, Supplementary Material online).We found all these flanking genes in the genome sequences of all examined species, and they mostly shared the same scaffold as T1R1.However, in the common gundi, although all the flanking genes were on a single scaffold (PVKB01001179.1), the T1R1 gene itself was not identified (see supplementary information and supplementary tables S5 and S6, Supplementary Material online).Notably, there are no gaps in the genomic region containing these franking genes in the scaffold PVKB01001179.1.Therefore, it is possible that the T1R1 gene has undergone pseudogenization in the common gundi genome, although we cannot rule out the possibility of assembly errors.In this regard, it is noteworthy that certain mammals, such as carnivores, cetaceans, and bats, have lost one or more T1R genes in response to changes in diet, feeding habitats, or environmental conditions (Jiang et al. 2012;Zhao et al. 2012; Antinucci and Risso 2017; Wolsan and Sato 2022).
We found that the Central American agouti possesses a surprisingly large number of OR genes, comparable with that of the African elephant (Niimura et al. 2014).This observation does not appear to stem from an assembly error, because (i) the size of the Central American agouti's genome assembly stands at 3.01 Gb (supplementary table S1, Supplementary Material online), falling within the range of other Hystricomorpha species' genome sizes, and (ii) the amino acid sequence diversity among all identified OR genes in the Central American agouti closely aligns with that of the African elephant (supplementary fig.S17, Supplementary Material online).If the extensive presence of OR genes in the Central American agouti genome resulted from an erroneous assembly, it would imply an abundance of similar OR genes.Our analyses showed that 1,914 Central American agouti OR genes and 1,948 African elephant OR genes are divided into 1,160 and 1,080 groups, respectively, based on a 90% amino acid sequence identity threshold (supplementary fig.S17, Supplementary Material online).Thus, the OR genes in the Central American agouti exhibit comparable diversity with those in the African elephant, suggesting that the Central American agouti's genome does contain a large repertoire of OR genes.Interestingly, the Central American agouti also hosts the largest numbers of V2R (93) and T2R (54) genes among the examined Hystricomorpha species.This T2R count is even the highest among any mammals reported (Hayakawa et al. 2014;Li and Zhang 2014;Shang et al. 2017).
Why does the Central American agouti possess such extensive OR, V2R, and T2R gene repertoires?There are several reasons why this may be advantageous.Central American agoutis have a habit to bury nuts in the ground, storing them for later consumption (Smythe 1978).This behavior may require a heightened sense of smell and taste to locate hidden food sources and identify potential dangers.Additionally, Central American agoutis are frugivores, with ∼40% of their diet consisting of fruits (Wilman et al. 2014), making them crucial seed dispersers.In this regard, it is worth noting that frugivorous primates tend to have a larger number of OR genes compared with their nonfrugivorous relatives (Niimura et al. 2018).Moreover, Hayden et al. (2014) reported that specific subsets of OR genes (OR1/3/7 and OR2/13) have expanded in frugivorous bats.However, OGG2-27, the most largely expanded OGG in the Central American agouti, does not belong to either OR1/3/7 or OR2/13.Exploring whether the same OGGs expanded or contracted in parallel across two or more distinct lineages due to similar dietary shifts would be intriguing.
It is also possible that the extensive gene repertoires of the Central American agouti occurred neutrally, driven by "genomic drift" (Nei et al. 2008).This species' genome might be more tolerant toward gene duplication compared with other species.The analysis in Fig. 5 implies that the number of OR genes in the ancestral nodes of the Hystricomorpha phylogeny remains relatively stable across evolution, despite the significant variation in the Niimura et al. • https://doi.org/10.1093/molbev/msae071MBE current number of OR genes among extant species.This trend may indicate an optimal number of genes for Hystricomorpha species.In such a scenario, even if the number fluctuates neutrally due to random drift, it would eventually regress to this optimal value.Currently, the reason behind the presence of extensive OR, V2R, and T2R gene repertoires in the Central American agouti remains unclear.Performing comparative genomic analyses of its close relatives might offer further insights.
Among the examined species, Patagonian maras have the largest repertoires of intact V1R genes (147) and pseudogenes (417).This observation may be attributed to the unique mating system and social structure of this species.Unlike most mammals, Patagonian maras are monogamous, forming strong, lifelong pair bonds (Kessler et al. 2009).The male follows and guards the female wherever she goes, marking her as his territory by urinating directly on her.
In this research, we assembled a high-quality genome of the grasscutter, achieving over 90% completeness based on the expected BUSCO genes.Despite previous studies attempting to estimate the divergence time among distinct Hystricomorpha species, a consensus remains elusive.Our research estimated the divergence between Hystricomorpha and Myomorpha to be around 61.4 million years ago (MYA).This estimation aligns closely with some existing studies (Hallstrom and Janke 2010;Jameson et al. 2011;dos Reis et al. 2012;Upham and Patterson 2012;Ge et al. 2013;Pozzi et al. 2014;Shao et al. 2015;Laurin et al. 2022) but significantly differs from others (16 MYA in Ge et al. (2013) and76.5 MYA in Fabre et al. (2012)).Additionally, the divergence estimate between guinea pigs and Montane guinea pigs was determined to be 0.107 MYA, consistent with Upham and Patterson (2012) but divergent from various other studies (Fritz et al. 2009;Fabre et al. 2012;Upham and Patterson 2015;Álvarez et al. 2017).Differences in these estimates are often related to the quantity and type of genes (nuclear or mitochondrial) employed.Notably, our study utilized the largest number of nuclear genes to date, thus yielding more accurate estimates of divergence times.
Hystricomorph rodents have a wide geographic distribution, populating every continent except Antarctica.The species belonging to Hystricomorpha exhibit substantial diversity in body size, ecological niches, and morphological traits.Exploring the genes responsible for chemosensory receptors in these species could offer insights into the relationship between genes and the variation in these traits.In this research, we have produced an updated version of the grasscutter genome (ThrSwi_NIG_v1).The grasscutter, found in sub-Saharan Africa, weighs between 3.5 and 4.5 kg and is a common source of meat consumed by humans, particularly in Western Africa (Adenyo et al. 2017;Dery et al. 2020).The molecular data obtained on chemosensory receptor genes, along with the whole genome sequence, will be instrumental in conducting further genetic analyses of grasscutters and other species within the hystricomorph group.
In summary, our analyses showed that the four gene families involved in the chemical senses (OR, V1R, V2R, and T2R genes) expanded or contracted synchronously during the evolution of Hystricomorpha.This means that when the number of genes in one family increased or decreased, the others tended to do the same.In other words, there was not a trade-off between different chemical senses.This coherence might be explained by differences in genome tolerance to gene duplications across species, as mentioned earlier.However, the precise evolutionary mechanisms responsible for this synchronicity across the gene families related to chemosensory receptors remain unclear, and further studies are needed to gain a better understanding of this phenomenon.

Evaluation of the Genome Assemblies
The genome assemblies of 17 Hystricomorpha species and mice were downloaded from the National Center for Biotechnology Information website (https://www.ncbi.nlm.nih.gov/assembly) (supplementary table S1, Supplementary Material online).The completeness of the genome assemblies was evaluated using BUSCO v5.0 (Manni et al. 2021) and CEGMA v2.5 (Parra et al. 2007).For the BUSCO analysis, the Glires data set from OrthoDBv10 (creation date: 2020 August 5) (Kriventseva et al. 2019) was used because it is the closest available lineage database to Hystricomorpha.BUSCO v5.0 uses MetaEuk (Levy Karin et al. 2020) and HMMER 3.1+ (Eddy 2011) to verify assembly completeness and gene prediction.The rest of the settings were default in the BUSCO analysis.For the CEGMA analysis, 233 CVGs that are shared as one-to-one orthologs by all vertebrate genomes were used (Hara et al. 2015).

Grasscutter Genome Analysis
Genomic DNA from a male grasscutter was extracted from the muscle tissue using the following methods.Muscle tissue was lysed in lysis buffer (KURABO Genomic DNA Extraction Solution, KURABO Industries Ltd., Osaka, Japan) supplemented with proteinase K at a final concentration of 0.2 mg/mL.The tissue was lysed at 55°C for overnight, followed by an RNase A treatment for 1 h at 37°C at 0.15 ng/mL concentration.Genomic DNA was extracted using phenol/chloroform followed by chloroform.Genomic DNA was precipitated with isopropanol, washed with 70% ethanol, and dissolved in TE buffer.
For the whole genomic sequencing, one paired-end library, PE600, with 638 bp insert sizes on average, and four mate-pair libraries with different insert sizes (MP3000, 3,147 bp; MP6000, 6,157 bp; MP10000, 9,575 bp; MP15000, 13,457 bp) were generated from the genomic DNA.The five libraries were sequenced on an Illumina HiSeq 2500 platform.The grasscutter genome size was estimated as 2.2 Gb using k-mer distribution analysis in GenomeScope 1.0 (Vurture et al. 2017).The statistics of the new grasscutter genome assembly (ThrSwi_NIG_v1) were calculated using QUAST (Gurevich et al. 2013) with a sequence cutoff of 1,000 bp: The assembly was 2.18 Gb long with 20,779 Evolution of Chemosensory Receptor Genes in Hystricomorph Rodents • https://doi.org/10.1093/molbev/msae071MBE scaffolds, a scaffold N50 length of 20,890,746, and a GC content of 42.5%.Raw sequencing reads and assembled genomes are available in the DDBJ/GenBank/EMBL with BioProject accession number PRJDB10995.Moreover, we developed a web browser for the new grasscutter genome to ensure that the genomic information can be more widely and effectively used by many researchers (https:// grasscutter.nig.ac.jp/).

ML Phylogenetic Analysis
The amino acid output of BUSCO analysis, which uses orthologous gene sequences in its database, was used for phylogenetic analysis.We selected the amino acid sequences of genes that were complete, single-copy, and present in all species examined in our study, identifying 2,520 sequences that met this criterion (supplementary table S4, Supplementary Material online).For each protein, we performed a local alignment using MAFFT v7.490 (Katoh and Standley 2013) with the "--maxiterate 1,000 --localpair" command.Low-quality alignments were filtered out using trimAI (Capella- Gutierrez et al. 2009) with the "-automated1" command.After alignment and quality trimming, we concatenated all sequences together.The resultant aligned FASTA file was then used for ML tree construction in raxmlGUI 2.0 (Edler et al. 2021).To determine the appropriate model, we used ModelTest-NG (Darriba et al. 2020) and found that JTT + I + G4 + F was the best fit.We then generated a RAxML tree with 1,000 bootstraps (Stamatakis 2014).The resulting tree was visualized and edited using FigTree (http://tree.bio.ed.ac.uk/software/ figtree/).

Divergence Dating
This analysis used aligned and quality-trimmed protein sequences obtained from BUSCO.The best fit evolutionary model for each protein sequence was selected using PartitionFinder2 (Lanfear et al. 2017) with "rclusterf" search strategy (Lanfear et al. 2014) and "--raxml" (Stamatakis 2014) command.PartitionFinder2 produced 14 different models for the protein sequences (supplementary table S4, Supplementary Material online).AMAS was used to concatenate proteins with similar evolutionary models into an amino acid supermatrix (Borowiec 2016).Divergence dating was performed using BEAST 2.6 (Bouckaert et al. 2019) with each amino acid supermatrix treated as a partition (14 partitions in total).All site models were unlinked, while clock models and trees were linked between the partitions.Each partition was assigned a different model based on the model predicted by PartitionFinder2.The relaxed log-normal was used in the clock model (Drummond et al. 2006), and different fossil calibrations were used to constrain the trees (supplementary information, Supplementary Material online).The Birth-Death model was chosen prior to tree generation, setting the MCMC chain length at 20,000,000 generations, sampling every 5,000 generations.Analyses were checked for convergence using Tracer v.1.7.1 (Rambaut et al. 2018).Trees before convergence were discarded as burn-in, and a single consensus tree was generated using TreeAnnotator.The final tree was generated using R package strap (Bell and Lloyd 2015).Posterior probabilities were added to the tree using FigTree (http://tree.bio.ed.ac.uk/software/figtree/).Our findings were also verified using resources from timetree.org(Kumar et al. 2017).

Identification of OR Genes
OR genes were identified in the genome sequences of the 17 Hystricomorpha species (supplementary table S1, Supplementary Material online).We identified intact genes, truncated genes, and pseudogenes of ORs using the method described in Niimura ( 2013) with a modification described in the supplementary note in Niimura et al. (2018).To refine the OR genes identified above, these genes were further filtered using four additional steps (supplementary fig.S18, Supplementary Material online).(i) Genes encoded in contigs shorter than 1 kb were discarded.(ii) When a contig was embedded within another longer contig with a >99% nucleotide sequence identity and an OR gene encoded in one contig is >99% identical in amino acid sequence to the OR gene encoded in the other contig, the OR gene encoded in a shorter contig was discarded (Niimura et al. 2018).(iii) Intact/truncated OR genes with 100% amino acid sequence identity with another intact/truncated OR gene in the same species were discarded in the following way.When two intact OR genes were 100% identical in amino acid sequences without any gaps, either of the genes was discarded.When a truncated gene was a part of an intact gene with 100% identity in the amino acid sequence without any gaps, the truncated gene was discarded.When a truncated gene was a part of another truncated gene with 100% identity in the amino acid sequence without any gaps, the shorter truncated gene was discarded.(iv) When there were two truncated genes, one of which was missing the N-terminal end and the other was missing the C-terminal, and the amino acid sequence in an overlapped region was 100% identical to each other without any gaps, then the two truncated genes were combined into one sequence and regarded as an intact gene.
Identification of V1R and T2R Genes V1R and T2R genes were identified using a method similar to that used for OR genes, with slight modifications.V1R and T2R genes show weak similarity to each other; therefore, we identified V1R and T2R genes together.First, we performed TBLASTN searches (Altschul et al. 1997) with an e-value of 1e −20 against the whole genome sequences of the 17 Hystricomorpha species using the amino acid sequences of 318 V1R genes, 8 ancV1R genes, and 158 T2Rs genes as queries.Query sequences were obtained as follows: For V1R genes, we extracted functional V1R genes from 10 mammalian species (human, mouse lemur, mouse, rat, guinea pig, dog, cat, cow, horse, and elephant) from the supplementary data of Young et al. (2010).To eliminate highly similar sequences in the same species, Niimura et al. • https://doi.org/10.1093/molbev/msae071MBE we calculated the amino acid sequence identities between all possible pairs of V1R genes in each species and classified them into groups with a threshold of 80% amino acid sequence identity.We then extracted representative genes from each group to obtain a total of 318 (5 human, 21 mouse lemur, 76 mouse, 67 rat, 47 guinea pig, 7 dog, 16 cat, 25 cow, 31 horse, and 23 elephant) V1R genes that were used as queries.The amino acid sequences of eight ancV1R genes from eight mammalian species (mouse, rat, guinea pig, dog, cat, cow, horse, and elephant) were obtained from Zhang and Nikaido (2020).The amino acid sequences of 24 human, 36 mouse, 36 rat, and 31 guinea pig T2R genes were obtained from Hayakawa et al. (2014).Additionally, 14 dog and 17 cow T2R genes downloaded from GenBank were also used as queries.
Because the query sequences are similar to one another, multiple query sequences may hit the same genomic region.We therefore extracted the "best-hit" sequence for each genomic region, which corresponds to a query showing the lowest e-value among all queries that hit a given genomic region (Niimura 2013).The best-hit sequences were classified into three categories according to the query: V1R, ancV1R, and T2R.To extract intact genes from the best-hits, the following filtering process was conducted: First, the best-hit sequences shorter than 250 amino acids were discarded.We then extended each of the remaining best-hit sequences along the DNA to extract the longest CDS, starting from the initiation codon and ending with the stop codon.If the extracted CDS was shorter than 250 amino acids, the sequence was discarded.
Subsequently, the criteria for selecting intact sequences among the candidate sequences obtained above differed between V1R and T2R genes.The methods used for the V1R genes are as follows.Because we extracted the longest CDSs using the criteria above, some sequences contained excessively longer N-terminal sequences than the known V1R genes.Therefore, we chose the most appropriate ATG codon as an initiation codon among the multiple ATG codons located in the N-terminal portion in the following manner.We constructed a multiple alignment using all remaining sequences from each species together with 76 mouse V1R genes, which were used as queries in the TBLASTN searches mentioned above.The 24th amino acid of mouse Vmn1r1 (GenBank accession number, NP_001160200.1) is G (glycine), which is relatively well conserved among all mouse V1Rs.Therefore, we designated this amino acid position in each sequence as "0" (origin).We then counted the number of amino acids before the amino acid position 0 for each sequence.We selected an ATG codon located at the most appropriate position among the multiple ATG codons as an initiation codon in the following manner.(i) If the length of the N-terminal portion was zero, the sequences were discarded.(ii) Otherwise, we chose the ATG codon that is located at the most upstream position among all ATG codons existing between position −40 and position 0. Next, we constructed multiple alignments of the remaining sequences using MAFFT (Katoh and Standley 2013) and visually inspected them.Sequences shorter than 280 amino acids were excluded.Next, we predicted the TM helical regions for each of the 76 query genes using TMHMM-2.0 (Krogh et al. 2001).The 15-amino-acid region from position 274 to position 288 ("MFVSSGYATFSPLVF") of Vmn1r1 corresponds to the 7th TM region for most of the 76 queries.Here, we call this region as the "7TM region."If the number of gaps within the 7TM region in a multiple alignment was seven or more, such sequences were eliminated.All remaining sequences were regarded as intact V1R genes because they did not contain any gaps in well-conserved regions.
Intact T2R genes were selected from the candidate sequences in a manner similar to that used for V1R genes.We chose the most appropriate ATG codon as an initiation codon among the multiple ATG codons located in the N-terminal portion in the following manner.We constructed a multiple alignment using all remaining sequences from each species, together with 24 human and 36 mouse T2R genes used as queries in the TBLASTN searches.The 16th amino acid of human TAS2R5 (GenBank accession number, NP_061853.1) is E (glutamic acid), which is relatively well conserved among all human and mouse T2Rs.Moreover, for most human and mouse T2Rs, the length of the N-terminal portion prior to this amino acid is 15.We designated this amino acid position in each sequence as "0" (origin) and counted the number of amino acids before the amino acid position 0 for each sequence.We selected ATG as the initiation codon in the following manner.(i) If the length of the N-terminal portion was zero, the sequences were discarded.(ii) Otherwise, we chose the ATG codon that is located at the most upstream position among all ATG codons existing between position −27 and position 0. Finally, we constructed multiple alignments of the remaining sequences using MAFFT and visually inspected the alignment.We eliminated sequences shorter than 280 amino acids, and all remaining sequences were regarded as intact T2R genes because they did not contain any gaps in well-conserved regions.
Next, we identified truncated genes from the nonintact best-hit sequences.The criteria used to identify V1R and T2R truncated genes were the same as those for OR genes.All the best-hit sequences for V1R and T2R gene queries, excluding intact and truncated genes, were regarded as V1R and T2R pseudogenes, respectively.
Finally, the intact genes, truncated genes, and pseudogenes of V1R and T2R identified above were further refined by the additional four steps, which were used for the filtering processes for OR genes, as described above.

Estimation of the Probability of Truncation of V2R Genes
We first calculated the mean length of 121 functional V2R genes in mice obtained in the "Identification of Exon 3 Sequences of V2R and T1R Genes" section.The coordinates of CDSs of these genes were retrieved from the file "Mus_musculus.GRCm39.104"downloaded from the Ensembl website (https://asia.ensembl.org/).The length Evolution of Chemosensory Receptor Genes in Hystricomorph Rodents • https://doi.org/10.1093/molbev/msae071MBE of each V2R gene was calculated as a distance between the positions of the initiation codon and the stop codon.The mean length L of the V2R genes was calculated to be 25.3 kb.
Using L, the probability of truncation of a V2R gene for a given genome assembly was estimated in the following way.Suppose that the genome assembly is composed of n 1 scaffolds that is longer than L and n 2 scaffold that is shorter than or equal to L. Let us consider the situation that a V2R gene is located on a scaffold longer than L (top, supplementary fig.S19, Supplementary Material online).In this case, when the center of a V2R gene is located within the region of L/2 at either end of the scaffold, the gene is truncated, and the entire CDS cannot be retrieved.On the other hand, if a V2R gene is located on a scaffold shorter than or equal to L, the gene is truncated regardless of a location of the center of the gene (bottom, supplementary fig.S19, Supplementary Material online).Therefore, the probability P of truncation of a V2R gene can be estimated as where S i is the length of i-th longest scaffold in the genome assembly and N is the entire length of the genome assembly.

Identification of Exon 3 Sequences of V2R and T1R Genes
We first downloaded Vmn2r1 paralogs in mice from the Ensembl website (https://asia.ensembl.org/),which included V2R genes, T1R genes, and non-V2R/T1R GPCR genes such as the calcium-sensing receptor (CaSR) gene.We then constructed a multiple alignment of the amino acid sequences of these genes using "Mus_musculus.GRCm39.pep.all"file by MAFFT (Katoh and Standley 2013) and extracted the exon sequences that correspond to the exon 3 of Vmn2r1 by visual inspection.We call these sequences "exon 3" regardless of the difference in exon/intron structures, though for some genes, the corresponding exon is not actually the third exon.As a result, we obtained 128 unique exon 3 sequences of 121 V2R, 3 T1R, and 4 non-V2R/T1R GPCR genes.These sequences were used as queries for TBALSTN searches (Altschul et al. 1997) against the 17 Hystricomorpha genome assemblies with an e-value of 1e −20 .We then extracted "best-hit" genomic regions to any of the 128 query sequences as explained in "Identification of V1R and T2R Genes" section.When the translated amino acid sequence for a best-hit DNA sequence were shorter than 250 amino acid long or contained an interrupting stop codon or a frameshift, the sequence was discarded.Next, we constructed a phylogenetic tree using the translated amino acid sequences of the remaining sequences together with the 128 query sequences and identified V2R and T1R genes according to the tree topology.Hystricomorpha V2R genes identified here were further classified into seven V2R gene clades, AH, A1-5, A8, A10, B, C, and D. The clade names of Hystricomorpha V2R genes were according to the nomenclature of mouse V2R genes (Silvotti et al. 2011) which formed a monophyletic clade with these genes, with the exception of AH clade containing only Hystricomorpha genes.Next, we identified putative exon-intron boundaries that meet the "GT-AG rule," which states that the first two and the last two nucleotides of introns are GT and AG, respectively, for each sequence.We did not consider noncanonical splice sites for 2 reasons: (i) all 121 exon 3 sequences of mouse V2R genes adhere to the GT-AG rule, and (ii) the overwhelming majority (>98.7%) of splice sites in mammalian genes is reported to follow the GT-AG rule (Burset et al. 2000).We constructed a multiple alignment for each clade separately with mouse query sequences.Each of the sequence was elongated to find putative exon-intron boundaries that met the GT-AG rule within a 15-nucleotide region from the end of the best-hit sequence.If there were two or more positions that met the GT-AG rule at either end of the sequence, the most appropriate position was adopted by visual inspection.The sequence lacking an appropriate exonintron boundary meeting the GT-AG rule at either end was discarded, and the remaining sequences were regarded as intact sequences of the exon 3 of V2R genes.Exon 3 sequences of T1R genes were identified in a similar manner.After excluding intact V2R sequences, the sequences that best-hit to the mouse V2R queries with an e-value less than 1e −20 and that resided in contigs not shorter than 1 kb were regarded to be V2R pseudogenes.

Construction of Neighbor-Joining Phylogenetic Trees
A neighbor-joining (NJ) tree (Saitou and Nei 1987) was constructed with Poisson correction (PC) distance using the LINTREE program (Takezaki et al. 1995).Multiple alignments of translated amino acid sequences were generated by the program MAFFT (Katoh and Standley 2013).
In the process of OR gene identification, we used 781 consensus sequences, each of which was constructed from a multiple alignment of all intact OR genes contained in each of the 781 placental-OGGs, as queries of TBLASTN searches (Niimura et al. 2018) (see supplementary fig.S18, Supplementary Material online).Each hystricomorph intact OR gene "best-hit" to one of the 781 queries (Niimura 2013).Based on these facts, a given hystricomorph intact OR gene was assigned to the placental-OGG from which its best-hit query was generated.
Next, we constructed an NJ phylogenetic tree for each placental-OGG separately using all hystricomorph intact OR genes assigned to the OGG and all intact OR genes from 13 placental mammals belonging to the OGG identified in Niimura et al. (2014).We also constructed a phylogenetic tree for placental-OGG using all hystricomorph OR genes in the OGG together with the intact OR genes only from mice and rats, rather than the 13 species.We then examined whether the hystricomorph OR genes were monophyletic or paraphyletic in each phylogenetic tree.When all hystricomorph OR genes assigned to a given placental-OGG formed a monophyletic clade, the hystricomorph OR genes were regarded as members of the candidate hystricomorph-OGG.Phylogenetic trees showing paraphyly of hystricomorph genes were further examined.From each of the Hystricomorpha-paraphyletic trees, we extracted phylogenetic clades that met either of the following conditions: (i) a clade that was supported with a >70% bootstrap value and contained a part (not all) of hystricomorph OR genes assigned to the OGG and at least one nonhystricomorph genes or (ii) a clade with a >70% bootstrap value that contains a part (not all) of hystricomorph OR genes assigned to the OGG, and the tree topology suggests that the divergence of the clade occurred before the separation of Hystricomorpha and non-Hystricomorpha with a >70% bootstrap support.The hystricomorph genes included in such a clade were regarded as members of the candidate hystricomorph-OGG.For each placental-OGG, the genes included in such a clade were excluded, and a phylogenetic clade was constructed again.This process was iteratively performed until no such clades were extracted and the remaining hystricomorph genes were regarded as members of another candidate hystricomorph-OGG.
Finally, for each of the candidate hystricomorph-OGGs, an NJ phylogenetic tree was constructed using only hystricomorph OR genes assigned to the OGG.For each phylogenetic tree, we examined whether OR genes from the common gundi, the most basal species examined in the Hystricomorpha phylogeny, showed monophyly or paraphyly in each phylogenetic tree.When a tree consisted of two clades, each of which contained both gundi and nongundi genes, and the separation of the two clades was supported with a >70% bootstrap value, the candidate hystricomorph-OGG was subdivided into two hystricomorph-OGGs.All constructed phylogenetic trees were visually inspected to confirm that the hystricomorph-OGGs were appropriately identified.

Identification of OGGs among Hystricomorph V1R, V2R, and T2R Genes
We classified 1,237 intact V1R genes identified from 17 Hystricomorpha species into OGGs among Hystricomorpha.We first constructed an NJ phylogenetic tree using all 1,237 V1R genes and 74 representative mouse V1R genes from Young et al. (2010) that were used as queries for TBLASTN searches.We found that these V1R genes were clearly classified into 2 groups: one containing 649 genes that corresponds to clades A, B, C, H, I, J, and K and the other containing 662 genes that corresponds to clades D, E, F, and G, according to the nomenclature of Rodriguez et al. (2000) and Grus et al. (2005).We then constructed a phylogenetic tree for each of the two groups separately and subdivided the genes included in a given tree into smaller groups at the node supported with a high (>90%) bootstrap value.We iteratively performed these processes and identified OGGs among Hystricomorpha by visual inspection in the same criteria as those for OR genes.
For T2R genes, we first constructed an NJ phylogenetic tree using all 517 intact T2R genes identified from 17 Hystricomorpha species together with 24 human and 36 mouse T2R genes from Hayakawa et al. (2014) that were used as queries for TBLASTN searches.Most hystricomorph genes form monophyletic clades with some of the human and/or mouse T2R genes with >95% bootstrap values, showing clear orthologous relationships to human/ mouse T2R genes.Hystricomorph genes that did not show clear orthology to human/mouse T2R genes were extracted, and a phylogenetic tree was constructed using these genes only.By visual inspection of the tree, OGGs among Hystricomorpha were determined using the same criteria as those for the OR genes.
Exon 3 sequences of V2R genes were also classified into OGGs.Each of the clades A1-5, A8, A10, B, C, and D corresponds to an OGG, while hystricomorph-specific AH clade was divided into two OGGs according to the criteria used for the identification of OGGs of OR genes.

Estimation of the Numbers of Gene Gains and Losses
The number of gene gains and losses in each branch of Hystricomorpha phylogeny was estimated using the reconciled tree method with a 70% bootstrap value for the Evolution of Chemosensory Receptor Genes in Hystricomorph Rodents • https://doi.org/10.1093/molbev/msae071MBE threshold of reconciliation (Niimura and Nei 2007).We calculated the numbers for each OGG separately, without using any outgroup sequences, as described in Niimura et al. (2014).The results for all the OGGs were compiled to generate Fig. 6.
The rate of gene gain per gene and that of gene loss per gene was calculated as follows: For each branch, the rate of gene gain/loss was calculated by dividing the number of gene gains/losses at the branch by the number of genes in the ancestral node to which the branch connects.The means were calculated among the values for all branches in the Hystricomorpha phylogeny for gene gains and losses separately.

Fig. 3 .
Fig. 3. Expansion of OR genes within OGG2-27 in Hystricomorpha evolution.a) A histogram illustrating the number of OR genes belonging to each of the 654 OGGs.OGG2-27 is denoted by an arrowhead.b) A phylogenetic tree for 363 hystricomorph OR genes within OGG2-27.The color codes are shown on the right.The scale bar represents the number of amino acid substitutions per site.c) Depiction of gains and losses of OR genes within OGG2-27 in each branch of the Hystricomorpha evolution.The boxes on external and internal nodes represent the number of intact genes within OGG2-27 in each extant species and estimates of functional genes in the ancestral species, respectively.

Fig. 5 .
Fig.5.Gains and losses of OR, V1R, V2R, and T2R genes during Hystricomorpha evolution.The boxes on external and internal nodes represent the number of intact genes in each extant species and estimates of functional genes in the ancestral species, respectively.Additionally, it displays the estimated numbers of gene gains and losses in each branch of the Hystricomorpha phylogenetic tree.

Fig. 6 .
Fig. 6.Correlation in the numbers of gene gains (a) and gene losses (b) per branch of the Hystricomorpha evolution among OR, V1R, V2R, and T2R genes.The histograms within squares on a diagonal line represent the distribution of the numbers of gene gains (a) or losses (b).The numbers within squares at the top right display Spearman's rank correlation coefficient.*P < 5%; **P < 1%; ***P < 0.1%.The graphs within squares at the bottom left indicate the scatterplot with a regression line.

Table 1
Five multigene families of GPCRs for mammalian chemosensory systems